CIE: Use a faster cbrtf implementation
This is the approximate cube root implementation for IEEE floats from
Hacker's Delight. Because it has no conditional branches, it is probably
also a better candidate for future SIMD-accelerated code paths.
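
For reference, a minimal sketch of the technique, not necessarily the
exact code in this commit. The name approx_cbrtf, the use of memcpy for
the bit reinterpretation, and the choice of two Newton-Raphson steps are
assumptions made for illustration; the bias constant is assumed to match
the referenced acbrt.c. It is only meant for positive, finite inputs.

```c
#include <stdint.h>
#include <string.h>

/* Approximate cube root of a positive, finite IEEE 754 single-precision
 * value. approx_cbrtf is a hypothetical name used for illustration. */
static inline float
approx_cbrtf (float x)
{
  uint32_t i;
  float    y;

  /* Reinterpret the float's bits as an unsigned integer. */
  memcpy (&i, &x, sizeof i);

  /* Branch-free integer approximation of i / 3:
   * 5/16 * 17/16 * 257/256 = 0.33332825... */
  i = i / 4 + i / 16;
  i = i + i / 16;
  i = i + i / 256;

  /* Exponent bias correction (constant assumed from acbrt.c). */
  i = 0x2a5137a0u + i;

  memcpy (&y, &i, sizeof y);

  /* Refine with Newton-Raphson steps for y^3 = x.
   * Two steps is an assumption, trading speed for accuracy. */
  y = 0.33333333f * (2.0f * y + x / (y * y));
  y = 0.33333333f * (2.0f * y + x / (y * y));

  return y;
}
```

The whole function is straight-line integer and float arithmetic, which
is what makes a per-lane SIMD translation plausible.
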
On an Intel i7 Haswell, converting a 15 megapixel buffer from
"RGBA float" to "CIE Lab alpha float" now takes 0.27s instead of the
earlier 0.35s. A "Y float" to "CIE L float" conversion takes 0.085s
instead of 0.102s.
Original code: http://www.hackersdelight.org/hdcodetxt/acbrt.c.txt
Permissions: http://www.hackersdelight.org/permissions.htm
https://bugzilla.gnome.org/show_bug.cgi?id=791837